Advanced Persistent Threat (APT) attackers apply multiple sophisticated methods to continuously and stealthily steal information from targeted cloud storage systems, and can even induce the storage system to apply a specific defense strategy and then attack it accordingly. In this paper, the interactions between an APT attacker and a defender, each allocating Central Processing Units (CPUs) over multiple storage devices in a cloud storage system, are formulated as a Colonel Blotto game. The Nash equilibria (NEs) of the CPU allocation game are derived for both symmetric and asymmetric CPU resources between the APT attacker and the defender, to evaluate how the limited CPU resources, the data storage size and the number of storage devices impact the expected data protection level and the utility of the cloud storage system. A CPU allocation scheme based on "hotbooting" policy hill-climbing (PHC), which exploits experiences in similar scenarios to initialize the quality values and thereby accelerate learning, is proposed for the defender to achieve the optimal APT defense performance in the dynamic game without knowledge of the APT attack model or the data storage model. A hotbooting deep Q-network (DQN)-based CPU allocation scheme further improves the APT detection performance for the case with a large number of CPUs and storage devices. Simulation results show that the proposed reinforcement-learning-based CPU allocation improves both the data protection level and the utility of the cloud storage system compared with Q-learning-based CPU allocation against APTs.
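
The hotbooting PHC idea can be illustrated with a minimal Python sketch. It is not the paper's implementation. The payoff (a device's data counts for the defender when it places more CPUs there than the attacker, against it when it places fewer, and zero on a tie), the state definition (the attacker's previous allocation), the uniformly random attacker, and every name and parameter value below are assumptions made for illustration only.

```python
import itertools
import random


def allocations(total_cpus, num_devices):
    """Enumerate every split of total_cpus over num_devices."""
    return [a for a in itertools.product(range(total_cpus + 1), repeat=num_devices)
            if sum(a) == total_cpus]


def utility(defense, attack, data_sizes):
    """Assumed payoff: +B_i when the defender out-allocates the attacker
    on device i, -B_i when it is out-allocated, 0 on a tie."""
    return sum(b * ((m > n) - (m < n))
               for m, n, b in zip(defense, attack, data_sizes))


def phc_train(def_actions, att_actions, data_sizes, q_init=None,
              steps=3000, alpha=0.2, gamma=0.5, delta=0.05, seed=0):
    """Policy hill-climbing for the defender against a uniformly random
    attacker (an assumption). q_init carries the 'hotbooting' experience."""
    rng = random.Random(seed)
    n = len(def_actions)
    # State = attacker's previous allocation (an assumed state definition).
    q = {s: (list(q_init[s]) if q_init else [0.0] * n) for s in att_actions}
    pi = {s: [1.0 / n] * n for s in att_actions}
    state = rng.choice(att_actions)
    total = 0.0
    for _ in range(steps):
        a = rng.choices(range(n), weights=pi[state])[0]
        attack = rng.choice(att_actions)
        r = utility(def_actions[a], attack, data_sizes)
        total += r
        # Standard Q-learning update of the quality values.
        q[state][a] += alpha * (r + gamma * max(q[attack]) - q[state][a])
        # PHC step: shift probability mass toward the greedy action.
        best = max(range(n), key=lambda i: q[state][i])
        for i in range(n):
            if i != best:
                moved = min(pi[state][i], delta / (n - 1))
                pi[state][i] -= moved
                pi[state][best] += moved
        state = attack
    return q, pi, total / steps


# Hypothetical scenario: 2 devices, defender has 3 CPUs, attacker has 2.
data_sizes = [4, 1]
def_actions = allocations(3, 2)
att_actions = allocations(2, 2)

# "Hotbooting": a short run in a similar scenario supplies the initial Q-values.
q_prior, _, _ = phc_train(def_actions, att_actions, [3, 1], steps=500)
q, pi, avg_utility = phc_train(def_actions, att_actions, data_sizes,
                               q_init=q_prior, seed=1)
```

Passing `q_init=None` gives plain PHC with zero-initialized quality values, so the two variants can be compared on the same scenario. Enumerating all allocations grows quickly with the number of CPUs and devices, which is the regime where the abstract turns to a DQN-based scheme instead of a table.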